Computational tools for protein-DNA interactions
Abstract
Interactions between DNA and proteins are central to living systems, and characterizing how and when they occur would greatly enhance our understanding of working genomes. We review the different computational problems associated with protein-DNA interactions and the various methods used to solve them. A wide range of topics is covered, including physics-based models for direct and indirect recognition, identification of transcription factor binding sites, and methods to predict DNA-binding proteins. Our goal is to introduce this important problem domain to data mining researchers by identifying the key issues and challenges inherent to the area and by providing directions for fruitful future research.
Interactions between deoxyribonucleic acid (DNA) and proteins are widely recognized as central to living systems. These interactions come in a variety of forms, including repair of damaged DNA and transcription of genes into RNA. More recently it has been found that, by binding to certain DNA segments, proteins can promote or repress the transcription of genes in the vicinity of the binding site. Proteins of this kind are referred to as transcription factors (TFs). The number of TFs in an organism appears to be related to the complexity of the underlying genome: as the number of genes increases, the number of TFs increases according to a power law [1]. This many-fold increase of TFs appears to be required in order to manage transcription in higher organisms.
Characterizing how and when protein-DNA interactions occur would greatly enhance our understanding of the genome at work. A full picture of the interactions will eventually allow characterization of which genes are transcribed at any given time in order for the organism to react dynamically to a changing environment.
Protein-DNA interactions are studied both in the wet lab and computationally. Here a synergy exists: lab experiments provide data and problems for computational methods to solve, while computation provides hypotheses that guide additional lab experiments. The goal of this article is to review three major areas of interest for computational studies of protein-DNA interactions: (1) physics-based studies of protein-DNA interaction, (2) identification of transcription factor binding sites, and (3) identification of DNA-binding proteins.
How Many Binding Proteins Exist?
Accounts of how many DNA-binding proteins exist vary through the literature. Attention is particularly focused on transcription factors. Older sources estimated that 2-3% of a prokaryotic genome and 6-7% of a eukaryotic genome encode DNA-binding proteins [2]. This estimate was taken from the automatic gene annotation tool PEDANT [3]. Though contemporary estimates of the number of transcription factors range as high as 10% of all mammalian genes [4], averaging across genomes in the DBD database [5] classifies 4.65% of Metazoan (animal) genes as transcription factors (806 genes per animal genome) [1]. According to gene ontology annotations in PEDANT, there are currently 1714 genes in the human genome identified as coding for DNA-binding proteins with 885 of them identified as
Similar resources
Protein Secondary Structure Prediction: a Literature Review with Focus on Machine Learning Approaches
A DNA sequence, containing all genetic traits, is not a functional entity. Instead, it is converted to protein sequences by the transcription and translation processes. The protein sequence later takes on a 3D structure, which is a functional unit and can manage biological interactions using the information encoded in DNA. Every life process one can imagine is undertaken by proteins with specific functio...
Estimating the Location of Protein-Coding Regions in Numerical DNA Sequences Using a Variable-Length Window Based on the 3D Z-Curve
In recent years, estimation of protein-coding regions in numerical deoxyribonucleic acid (DNA) sequences using signal processing tools has been a challenging issue in bioinformatics, owing to their 3-base periodicity. Several digital signal processing (DSP) tools have been applied in order to identify the task and concentrated on assigning numerical values to the symbolic DNA sequence, then app...
Computational Tools for Investigating RNA-Protein Interaction Partners
RNA-protein interactions are important in a wide variety of cellular and developmental processes. Recently, high-throughput experiments have begun to provide valuable information about RNA partners and binding sites for many RNA-binding proteins (RBPs), but these experiments are expensive and time consuming. Thus, computational methods for predicting RNA-Protein interactions (RPIs) can be valua...
In silico investigation of lactoferrin protein characterizations for the prediction of anti-microbial properties
Lactoferrin (Lf) is an iron-binding multi-functional glycoprotein which has numerous physiological functions such as iron transportation, anti-microbial activity and immune response. In this study, different in silico approaches were exploited to investigate Lf protein properties in a number of mammalian species. Results showed that the iron-binding site, DNA and RNA-binding sites, signal pepti...
Gene-Nutrient Interactions in Cancer Incidence: A Systematic Review
Advances in molecular biology over the past decades have contributed to a profound understanding of the function of genes in the development of diseases. Environmental and nutritional factors interact with the genetic background of the subject, resulting in the development of various diseases including cancer, cardiovascular disease and degenerative nervous disorders. However, the exact mechanisms o...
Protein-DNA Binding: Discovering Motifs and Distinguishing Direct From Indirect Interactions
Protein-DNA Binding: Discovering Motifs and Distinguishing Direct From Indirect Interactions by Raluca M. Gordân Department of Computer Science Duke University Date: Approved: Alexander J. Hartemink, Advisor Uwe Ohler Bruce R. Donald David M. MacAlpine An abstract of a dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in t...
Journal title:
- Wiley Interdisc. Rev.: Data Mining and Knowledge Discovery
Volume 2, Issue
Pages -
Publication date 2012